Fast & Confident Probabilistic Categorisation

Author

  • Cyril Goutte

Abstract

We describe NRC’s submission to the Anomaly Detection/Text Mining competition organised at the Text Mining Workshop 2007. This submission relies on a straightforward implementation of the probabilistic categoriser described in [4]. This categoriser is adapted to handle multiple labelling and a piecewise-linear confidence estimation layer is added to provide an estimate of the labelling confidence. This technique achieves a score of 1.689 on the test data.
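The abstract names two additions to the categoriser of [4]: handling multiple labels per document, and a piecewise-linear layer that turns scores into a labelling confidence. Here is a minimal sketch, under our own assumptions, of how these two steps could sit on top of per-class posteriors P(c|d). The labelling rule (keep every class whose posterior reaches a threshold, and always keep the top class), the threshold value, and the calibration knots are all hypothetical. The same applies to the helper names `KNOTS`, `piecewise_linear`, `assign_labels` and `label_with_confidence`. None of them is taken from the paper.

```python
from bisect import bisect_right

# Hypothetical calibration knots mapping a posterior P(c|d) to a confidence
# in [0, 1]. In practice such knots would be fitted on held-out data;
# these values are illustrative only and are not from the paper.
KNOTS = [(0.0, 0.0), (0.2, 0.3), (0.5, 0.8), (1.0, 1.0)]


def piecewise_linear(x, knots):
    """Evaluate the piecewise-linear function through sorted (x, y) knots.

    Inputs outside the knot range are clamped to the first/last y value.
    """
    xs = [k[0] for k in knots]
    ys = [k[1] for k in knots]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # now xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def assign_labels(posteriors, threshold=0.25):
    """Hypothetical multi-label rule: keep every class whose posterior is at
    least `threshold`; if none qualifies, fall back to the single best class."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    labels = [c for c, p in ranked if p >= threshold]
    return labels if labels else [ranked[0][0]]


def label_with_confidence(posteriors, threshold=0.25, knots=KNOTS):
    """Return (label, confidence) pairs for the labels assigned to one document."""
    return [(c, piecewise_linear(posteriors[c], knots))
            for c in assign_labels(posteriors, threshold)]


if __name__ == "__main__":
    doc = {"A": 0.6, "B": 0.3, "C": 0.1}  # toy posteriors for one document
    for label, conf in label_with_confidence(doc):
        print(f"{label}: {conf:.3f}")
```

With the toy posteriors above, classes A and B pass the threshold, and the script prints `A: 0.840` and `B: 0.467`. A piecewise-linear map keeps the confidence monotone in the posterior while letting each segment correct over- or under-confidence in a different score range.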


Similar resources

A Probabilistic Model for Fast and Confident*

Permission is granted to quote short excerpts and to reproduce figures and tables from this report, provided that the source of such material is fully acknowledged.


A Probabilistic Neighbourhood Translation Approach for Non-standard Text Categorisation

The need for non-standard text categorisation, i.e. based on some subtle criterion other than topics, may arise in various circumstances. In this study, we consider written responses to a standardised psychometric test for determining the personality trait of human subjects. A number of state-of-the-art text classifiers that have been very successful in standard topic-based classification pro...


Probabilistic Models for Hierarchical Clustering and Categorisation: Applications in the Information Society

We propose a new hierarchical generative model for textual data, where words may be generated by topic-specific distributions at any level in the hierarchy. This model is naturally well-suited to clustering documents in preset or automatically generated hierarchies, as well as categorising new documents in an existing hierarchy. Furthermore, we present a series of applications that can benefit...


[Title not legible in the source]

The automatic categorisation of web documents is becoming crucial for organising the huge amount of information available on the Internet. We are facing a new challenge due to the fact that web documents have a rich structure and are highly heterogeneous. Two ways to respond to this challenge are (1) using a representation of the content of web documents that captures these two characteristics ...



Publication date: 2007